Background

We have acquired three-dimensional structured illumination microscopy images of nuclear pore complexes. The individual NPCs are segmented and their locations (x/y/z) are extracted using ImageJ plugins/macros. From this data, we want to:

In a separate R analysis workflow, we have developed a pipeline to reconstruct the surface by computing the 3D convex hull of the point cloud. This step also allows for optimization of the hull to remove outlier points. Once the optimization is complete, we want to quantitatively assess the distribution of the optimized point cloud. We are particularly interested in the extent of clustering of points, and whether certain conditions display more clustering than others. We will compare the clustering observed in our datasets to similar metrics from simulated data representing complete spatial randomness on the surface of a sphere.

Example data

Here are maximum intensity projections for a wild-type nucleus, and a representative nucleus from Nup132∆ background that displays increased clustering.

Nsp1-GFP WT

Nsp1-GFP WT

Nsp1-mCh Nup132∆

Nsp1-mCh Nup132∆

For simulated random data, we can simulate points on a sphere using the icosa package:

simulated <- data.frame(icosa::rpsphere(131, radius=1.2))

Then we can plot these points and visualize them with their reconstructed 3D convex hull surface:

## convexhull of points and visualization
NPC.plot <- function(sim, alpha = 0.4, ...) {
    sim <- as.data.frame(sim)
    x <- sim$x
    y <- sim$y
    z <- sim$z
    plot3d(x, y, z, col="blue", box = FALSE,
           type ="p", size = 5, aspect = "iso", axes = FALSE)
    ts.surf1 <- t(convhulln(sim))  # see the qhull documentations for the options
    convex1 <-  rgl.triangles(sim[ts.surf1,1],sim[ts.surf1,2],sim[ts.surf1,3],col="gold2",alpha= alpha)
}
NPC.plot(simulated)
rglwidget()

For comparison, we can do the same plots for the points extracted from the WT and Nup132∆ images.

#read in the datasets selecting the xyz coordinate columns
wt <- fread("./data/Nsp1-GFP_Ppc89-mCherry_02_visit_1_SIR_nuc5.xls", select=c("x", "y", "z"))
mut <- fread("./data/Nsp1mch_deltaNup132cl10_07_visit_2_SIR_nuc15.xls", select=c("x", "y", "z"))

#convert the data into micron space
NPC_pix2microns <-function(dataset) {
  dataset %>% 
    mutate(x = x*0.0400015,
           y = y*0.0400015,
           z = z*0.125)
}

wt <- NPC_pix2microns(wt)
mut <- NPC_pix2microns(mut)

NPC.plot(wt)
rglwidget()
NPC.plot(mut)
rglwidget()

Clustering analysis

Now, we want to analyze these data sets using DBSCAN. First, load in the required package.

library(dbscan)

Analysis of points using DBSCAN requires user input of two values: eps, or epsilon, which is the distance threshold which two points must be within to be classified as directly density-reachable, or in the same neighborhood; and minPts, which is the minimum number of points that a neighborhood must contain to be considered a cluster. The value of eps is user defined, while the value of minPts is typically set as the number of dimensions in your dataset plus 1 (so, minPts=4 for 3D point clouds).

For defining the value for eps, a common approach is to plot the points’ kth nearest neighbor distances in decreasing order and look for a “knee” in the plot. This is accomplished using the function kNNdistplot in the dbscan package:

#the default minPts value is used for 'k'
kNNdistplot(simulated, k=4)

kNNdistplot(wt, k=4)

kNNdistplot(mut, k=4)

We see that the values for “knees” in the 4-NN distances are variable between simulated, WT and mutant data sets. We will use some knowledge of our imaging and analysis approaches, as well as the known size of the NPCs to define a distance that we would consider NPCs to be clustered.

The resolution of our imaging approach is larger than the actual diameter of the yeast NPC (~100nm), and our track max not mask is performed with a minimum distance between points of 160nm. In vivo, the smallest distance between two NPCs would be 100 nm. So, for starters, let’s say that we define a cluster of NPCs as at least 3 NPCs with a minimum eps distance value of 200 nm. Let’s see what that looks like.

clust.plot <- function(sim, dbclust, alpha = 0.2, main=NULL, ...) {
    sim <- as.data.frame(sim)
    x <- sim$x
    y <- sim$y
    z <- sim$z
    plot3d(x, y, z, col=dbclust$cluster+1L, box = FALSE,
           type ="p", size = 6, aspect = "iso", axes = FALSE, main=main, sub=paste0("Clusters: ", max(dbclust$cluster)))
    ts.surf1 <- t(convhulln(sim))  # see the qhull documentations for the options
    convex1 <-  rgl.triangles(sim[ts.surf1,1],sim[ts.surf1,2],sim[ts.surf1,3],col="grey",alpha= alpha)
}
#run clustering analysis with dbscan, eps = 0.2, minPts=3
sim <- dbscan(simulated, eps=0.2, minPts=3)
wtdb <- dbscan(wt, eps=0.2, minPts=3)
mutdb <- dbscan(mut, eps=0.2, minPts=3)

sim
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.2, minPts = 3
## The clustering contains 13 cluster(s) and 81 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 
## 81  6  3  6  3  3  5  3  5  4  3  3  3  3 
## 
## Available fields: cluster, eps, minPts
clust.plot(simulated, sim)
rglwidget()
wtdb
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.2, minPts = 3
## The clustering contains 2 cluster(s) and 125 noise points.
## 
##   0   1   2 
## 125   3   3 
## 
## Available fields: cluster, eps, minPts
clust.plot(wt, wtdb)
rglwidget()
mutdb
## DBSCAN clustering for 66 objects.
## Parameters: eps = 0.2, minPts = 3
## The clustering contains 1 cluster(s) and 58 noise points.
## 
##  0  1 
## 58  8 
## 
## Available fields: cluster, eps, minPts
clust.plot(mut, mutdb)
rglwidget()

Now let’s try 225, 250, 275 and 300 epsilon values for comparison.

#run clustering analysis with dbscan, eps = 0.2, minPts=3
sim <- dbscan(simulated, eps=0.225, minPts=3)
wtdb <- dbscan(wt, eps=0.225, minPts=3)
mutdb <- dbscan(mut, eps=0.225, minPts=3)

sim
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.225, minPts = 3
## The clustering contains 18 cluster(s) and 60 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 
## 60  3  6  5  7  3  3  3  6  3  5  4  3  3  3  3  4  3  4 
## 
## Available fields: cluster, eps, minPts
clust.plot(simulated, sim)
rglwidget()
wtdb
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.225, minPts = 3
## The clustering contains 10 cluster(s) and 89 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 
## 89 11  4  4  3  3  4  3  4  3  3 
## 
## Available fields: cluster, eps, minPts
clust.plot(wt, wtdb)
rglwidget()
mutdb
## DBSCAN clustering for 66 objects.
## Parameters: eps = 0.225, minPts = 3
## The clustering contains 3 cluster(s) and 47 noise points.
## 
##  0  1  2  3 
## 47  9  6  4 
## 
## Available fields: cluster, eps, minPts
clust.plot(mut, mutdb)
rglwidget()
#run clustering analysis with dbscan, eps = 0.2, minPts=3
sim <- dbscan(simulated, eps=0.25, minPts=3)
wtdb <- dbscan(wt, eps=0.25, minPts=3)
mutdb <- dbscan(mut, eps=0.25, minPts=3)

sim
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.25, minPts = 3
## The clustering contains 17 cluster(s) and 50 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
## 50  8  6  6 10  4  3  6  5  4  3  5  4  3  3  5  3  3 
## 
## Available fields: cluster, eps, minPts
clust.plot(simulated, sim)
rglwidget()
wtdb
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.25, minPts = 3
## The clustering contains 12 cluster(s) and 77 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 
## 77 12  6  4  3  3  6  4  3  4  3  3  3 
## 
## Available fields: cluster, eps, minPts
clust.plot(wt, wtdb)
rglwidget()
mutdb
## DBSCAN clustering for 66 objects.
## Parameters: eps = 0.25, minPts = 3
## The clustering contains 3 cluster(s) and 46 noise points.
## 
##  0  1  2  3 
## 46  9  6  5 
## 
## Available fields: cluster, eps, minPts
clust.plot(mut, mutdb)
rglwidget()
#run clustering analysis with dbscan, eps = 0.2, minPts=3
sim <- dbscan(simulated, eps=0.275, minPts=3)
wtdb <- dbscan(wt, eps=0.275, minPts=3)
mutdb <- dbscan(mut, eps=0.275, minPts=3)

sim
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.275, minPts = 3
## The clustering contains 17 cluster(s) and 43 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
## 43  8  6  6 11  5  4  4  3  6  5  6  3  5  4  3  3  6 
## 
## Available fields: cluster, eps, minPts
clust.plot(simulated, sim)
rglwidget()
wtdb
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.275, minPts = 3
## The clustering contains 14 cluster(s) and 63 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 
## 63 19  4  6  4  6  4  3  3  3  4  3  3  3  3 
## 
## Available fields: cluster, eps, minPts
clust.plot(wt, wtdb)
rglwidget()
mutdb
## DBSCAN clustering for 66 objects.
## Parameters: eps = 0.275, minPts = 3
## The clustering contains 3 cluster(s) and 42 noise points.
## 
##  0  1  2  3 
## 42 15  6  3 
## 
## Available fields: cluster, eps, minPts
clust.plot(mut, mutdb)
rglwidget()
#run clustering analysis with dbscan, eps = 0.2, minPts=3
sim <- dbscan(simulated, eps=0.3, minPts=3)
wtdb <- dbscan(wt, eps=0.3, minPts=3)
mutdb <- dbscan(mut, eps=0.3, minPts=3)

sim
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.3, minPts = 3
## The clustering contains 15 cluster(s) and 32 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 
## 32 14 12  4 11 11  5  4  3  6  6  4  5  3  3  8 
## 
## Available fields: cluster, eps, minPts
clust.plot(simulated, sim)
rglwidget()
wtdb
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.3, minPts = 3
## The clustering contains 15 cluster(s) and 36 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 
## 36 25  9  4 10  4  7  4  4  4  4  3  4  3  7  3 
## 
## Available fields: cluster, eps, minPts
clust.plot(wt, wtdb)
rglwidget()
mutdb
## DBSCAN clustering for 66 objects.
## Parameters: eps = 0.3, minPts = 3
## The clustering contains 3 cluster(s) and 39 noise points.
## 
##  0  1  2  3 
## 39  8 16  3 
## 
## Available fields: cluster, eps, minPts
clust.plot(mut, mutdb)
rglwidget()

Let’s see how things change if we use the DBSCAN* settings, which says that border points (i.e., the point is not a core point, with at least minPts within eps, but is within eps of N-number points with N < minPts).

simbp <- dbscan(simulated, eps=0.2, minPts=3, borderPoints=FALSE)
wtdbbp <- dbscan(wt, eps=0.2, minPts=3, borderPoints=FALSE)
mutdbbp <- dbscan(mut, eps=0.2, minPts=3, borderPoints=FALSE)

simbp
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.2, minPts = 3
## The clustering contains 13 cluster(s) and 102 noise points.
## 
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13 
## 102   4   3   4   1   1   4   1   4   3   1   1   1   1 
## 
## Available fields: cluster, eps, minPts
clust.plot(simulated, simbp)
rglwidget()
wtdbbp
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.2, minPts = 3
## The clustering contains 2 cluster(s) and 129 noise points.
## 
##   0   1   2 
## 129   1   1 
## 
## Available fields: cluster, eps, minPts
clust.plot(wt, wtdbbp)
rglwidget()
mutdbbp
## DBSCAN clustering for 66 objects.
## Parameters: eps = 0.2, minPts = 3
## The clustering contains 1 cluster(s) and 61 noise points.
## 
##  0  1 
## 61  5 
## 
## Available fields: cluster, eps, minPts
clust.plot(mut, mutdbbp)
rglwidget()
rgl.close()

Excluding border points does not seem to provide any meaningful improvement in our analsis.

We see that the number of clusters detected is higher for simulated and wild-type data. The Nup132∆ mutant that is known to have clustered NPCs actually has fewer clusters ID’d. This suggests that the metric used to compare extent of clustering is not actually cluster number. Perhaps the fraction of NPCs in a cluster per nucleus is a meaningful metric. There is also a clear density-dependence even for randomly spaced NPCs - for example, at higher densities, there are more likely to be points within the eps radius and even if all of the points in that “cluster” are evenly spaced then it will be a larger cluster. Let’s see how this works with more simulated data.

#simulate same size sphere with 2x the number of points of earlier simulated data
sim2x <- icosa::rpsphere(n=250, radius=1.2)
sim2xdb <- dbscan(sim2x, eps=0.2, minPts=3)

#original
sim
## DBSCAN clustering for 131 objects.
## Parameters: eps = 0.3, minPts = 3
## The clustering contains 15 cluster(s) and 32 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 
## 32 14 12  4 11 11  5  4  3  6  6  4  5  3  3  8 
## 
## Available fields: cluster, eps, minPts
print(paste0("Number of Clusters: ", max(sim$cluster)))
## [1] "Number of Clusters: 15"
#doubled density
sim2xdb
## DBSCAN clustering for 250 objects.
## Parameters: eps = 0.2, minPts = 3
## The clustering contains 31 cluster(s) and 84 noise points.
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
## 84  5  6  4 10 24  7  4  6  6  6  8  3  8  7  3  3  7  5  3  3  5  3  3  3  4 
## 26 27 28 29 30 31 
##  5  3  3  3  3  3 
## 
## Available fields: cluster, eps, minPts
print(paste0("Number of Clusters: ", max(sim2xdb$cluster)))
## [1] "Number of Clusters: 31"
mfrow3d(nr=1, nc=2, sharedMouse=TRUE)
clust.plot(simulated, sim, main="131 points")
clust.plot(sim2x, sim2xdb, main="250 points")
rglwidget()

Fraction of NPCs in a cluster metric

What I think we want is what fraction of total NPCs are in a cluster. Get number of points with cluster not equal to 0 (meaning assigned to noise/no cluster by dbscan) in dbscan$cluster object. Divide this by total number of points to get fraction present in a cluster.

clust.frac <- function(x) {
  c <- as.vector(x$cluster)
  total <- length(c)
  clustd <- length(c[c>0])
  frac <- clustd/total
  data.frame(total, clustd, frac)
}

foo <- clust.frac(sim)
bar <- clust.frac(wtdb)
foobar <- clust.frac(mutdb)

print(paste0("Simulated Random: ", foo$frac))
## [1] "Simulated Random: 0.755725190839695"
print(paste0("WT: ", bar$frac))
## [1] "WT: 0.725190839694656"
print(paste0("Nup132∆: ", foobar$frac))
## [1] "Nup132∆: 0.409090909090909"

Simulated dataset using icosa

When simulations are done with icosa there is no variable to specify minimal point-point distance like there is in Jay’s ImageJ plugin. To get around this, we can simulate the data and then filter out any that are below the minimal distance detectable using our track max not mask plugin for SIM data (160nm).

simulate_data <- function(points, radius, num) {
  output <- list()
  for (i in 1:num) {
    x <- data.frame(icosa::rpsphere(n=points, radius=radius))
    d <- as.matrix(dist(x))
    d[d==0] <- NA
    min.d <- rowMins(d, na.rm=TRUE)
    x$min.dist <- min.d
    xf <- x %>% filter(min.dist >= 0.16)
    output[[i]] <- data_frame(xf)
  }
  return(output)
}

test <- simulate_data(150, 1.25, 100)

NPC.dbscan <- function(x) {
  d <- dbscan(x, eps=0.2, minPts=3)
  return(d)
}
  
testdb <- lapply(test, NPC.dbscan)
test.clustfrac <- lapply(testdb, clust.frac)
testfinal <- plyr::ldply(test.clustfrac)
testfinal %>% summarise_all(list(mean=mean, StDev=sd))
##   total_mean clustd_mean  frac_mean total_StDev clustd_StDev frac_StDev
## 1       81.7         3.1 0.03714374    8.074777     3.301362 0.03852862

Simulated with minimum distance threshold.

The inability of the icosa::rpsphere function to specify minimum distances results in a lot of points being removed using the minimum distance filter as shown above (i.e., even though 150 points simulated, mean number after distance filtering was only ~80). Intead, lets try to simulate with ImageJ plugin the random points on a sphere with a minimum distance of 100 nm (based on approximate minimum distance for touching NPCs), and then run those simluations through the NPC tracking plugin with the same minimum distance threshold as is used for the 3D SIM data (160 nm min seperation). In this way, we will be comparing clustering observed in data acquired/analyzed the same way.

Simulations were run one hundred times for a sphere with radius of 1200 nm and 125 points (based on mean values for Nsp1-GFP surface area and number of NPCs in mid to late G2 stage). The simulated images were than processed with track max not mask and the coordinates and pair correlation plots were saved.

Let’s read in the tracked coordinates and see how the data looks in terms of number of NPCs tracked vs. real (125).

simG2.files <- list.files(pattern = "\\.xls$", path = "./data/simulated_data/", recursive = FALSE, full.names = TRUE) #make a list of all of the files to be analyzed. This will only include files ending in xls
simG2.files <- mixedsort(simG2.files)
simG2.data <- lapply(simG2.files, fread, select = c("x", "y", "z"))
NPC_pix2microns <-function(dataset) {
  dataset %>% 
    mutate(x = x*0.0400015,
           y = y*0.0400015,
           z = z*0.125)
}

#Function to calcualte an object's sphericity from user-defined volume and surface area values
sphericity <- function(volume, area) {
  (pi^(1/3) * (6*volume)^(2/3))/area
}

##Custom function to extract surface area from 3D convexhull with error handling. 
surf.area <- function(x, cond="problem") {
  out <- tryCatch(
    {
      convhulln(x, options="FA")$area
    }, error = function(cond) {
      message("Error with convex hull computation")
      return(NA)
    }, warning = function(cond) {
      message("Warning during convex hull computation")
      return(NA)
    }
  )
  return(out)
}

##Custom function to extract volume from 3D convexhull with error handling. 
volume <- function(x, cond="problem") {
  out <- tryCatch(
    {
      convhulln(x, options="FA")$vol
    }, error = function(cond) {
      message("Error with convex hull computation")
      return(NA)
    }, warning = function(cond) {
      message("Warning during convex hull computation")
      return(NA)
    }
  )
  return(out)
}

#Function to calculate surface area, volume, sphericity and number of NPCs for dataframe of x/y/z NPC coordinates. Returns as a data frame. Uses convhulln function from 'geometry' package.
NPC.stats <-function(x) {
  num.NPCs <- nrow(x)
  SA <- surf.area(x)
  VOL <- volume(x)
  sphericity.x <- sphericity(VOL, SA)
  density <- num.NPCs/SA
  data.frame(SA, VOL, sphericity.x, num.NPCs, density)
}

#Function to optimize 3D convex hull for surface area and volume calculations. Allows removal of 10 percent of points to help detect and remove outliers from background.
#Not sure how this handles datasets that throw convhull errors. Can try to replace convhulln calls with surf.area and volume functions but then not sure how it will handle NA values.
hullopt <- function(data) {
    input <- data
    centroid <- data %>% 
      summarise_all(mean)
    #print(centroid)
    input$dist2cent <- as.numeric(dist.xyz(as.matrix(input), as.matrix(centroid))) #dist.xyz has to have matrices as input
    input <- input %>% arrange(desc(dist2cent))
    rows <- nrow(input)
    loop <- 1
    
    while (0.1*rows > loop) {
        i <- loop+1
        current <- input[loop:rows, ]
        currenthull <- convhulln(current[,1:3])
        currentrows <- nrow(current)
        currentpct <- length(unique(c(currenthull)))/currentrows
        currentSA <- convhulln(current[,1:3], options="FA")$area
        currentV <- convhulln(current[,1:3], options="FA")$vol
        currentsph <- sphericity(currentV, currentSA)
        test <- input[i:rows, ]
        testhull <- convhulln(test[,1:3])
        testrows <- nrow(test)
        testpct <- length(unique(c(testhull)))/testrows
        testSA <- convhulln(test, options="FA")$area
        testV <- convhulln(test, options="FA")$vol
        testsph <- sphericity(currentV, currentSA)
        
        loop <- loop + 1
        #print(paste(loop-1, currentpct, testpct, currentsph, testsph))
        
        if(testpct < currentpct) {
            loop <- loop-1
            break
        }
    }
    
    output <- input[loop:rows, 1:3]
    optinfo <- loop-1
    return(list("info" = optinfo, "output" = output))
    
}
names(simG2.data) <- gsub("\\.xls$", "", simG2.files) #remove the .xls from the names of the files
simG2.convert <- lapply(simG2.data, NPC_pix2microns) #convert the coordinates from pixels into micron space using custom function 'NPC_pix2microns', in which pixel/slice size is provided
simG2.convert <- lapply(simG2.convert, hullopt) #run converted x/y/z coordinates for NPCs through hull optimization function to remove any outlier spots prior to surface area/volume calculations.
simG2.converted <-  map(simG2.convert, 2) #extract the dataframe objects containing filtered x/y/z data sets for NPC stats analysis later
removed <- map(simG2.convert, 1) #make list of dbls containing number of points removed during hull optimization to be added to final results table later
removed <- unlist(removed) #convert list to a dataframe so can cbind later
simG2.results <- lapply(simG2.converted, NPC.stats) #calculate NPC number, NE surface area/volume, NPC density, sphericity using the custom 'NPC.stats' function on the data converted to microns
simG2results.combined <- plyr::ldply(simG2.results, data.frame) #convert list of NPC stats to a single data frame with one row per file/image analyzed

simG2results.combined %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% kable(.)
SA VOL sphericity.x num.NPCs density
17.36542 6.610894 0.980913 97.48 5.612595

So, we see that only ~97 NPCs out of 125 simulated are returned after tracking in ImageJ. The true surface area is 18.1 square microns, and computed is 17.36. True density is 6.9 NPCs/micron squared, while observed density is 5.6 NPCs/micron squared. This represents ~19 percent of NPCs that are lost during analysis.

Regardless of the loss of some NPCs, let’s see how the fraction clustered looks in the simulated random data.

simG2db <- lapply(simG2.converted, NPC.dbscan)
simG2.clustfrac <- lapply(simG2db, clust.frac)
simG2final <- plyr::ldply(simG2.clustfrac)
simG2final %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% kable(.)
total clustd frac
97.48 1.01 0.0103293

We see that on average, only 1% of simulated random NPCs are observed to be in clusters defined by epsilon value of 200nm radius and minimum number of points as 3.

Test this in batch with real data sets.

Let’s see how the fraction of clustered NPCs metric behaves on 3D-SIM image datasets.

Analysis of Wild-type Nsp1-GFP SIM data.

wt.files <- list.files(pattern = "\\.xls$", path = "./data/wt_data/", recursive = FALSE, full.names = TRUE) #make a list of all of the files to be analyzed. This will only include files ending in xls
wt.files <- mixedsort(wt.files)
wt.data <- lapply(wt.files, fread, select = c("x", "y", "z"))
names(wt.data) <- gsub("\\.xls$", "", wt.files) #remove the .xls from the names of the files
wt.convert <- lapply(wt.data, NPC_pix2microns) #convert the coordinates from pixels into micron space using custom function 'NPC_pix2microns', in which pixel/slice size is provided
wt.convert <- lapply(wt.convert, hullopt) #run converted x/y/z coordinates for NPCs through hull optimization function to remove any outlier spots prior to surface area/volume calculations.
wt.converted <-  map(wt.convert, 2) #extract the dataframe objects containing filtered x/y/z data sets for NPC stats analysis later
wt.removed <- map(wt.convert, 1) #make list of dbls containing number of points removed during hull optimization to be added to final results table later
wt.removed <- unlist(wt.removed) #convert list to a dataframe so can cbind later
wt.results <- lapply(wt.converted, NPC.stats) #calculate NPC number, NE surface area/volume, NPC density, sphericity using the custom 'NPC.stats' function on the data converted to microns
wtresults.combined <- plyr::ldply(wt.results, data.frame) #convert list of NPC stats to a single data frame with one row per file/image analyzed
wtresults.combined$Points.Removed <- wt.removed #add number of points removed column

wtresults.combined %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% kable(.)
SA VOL sphericity.x num.NPCs density Points.Removed
21.28076 8.360911 0.9253715 108.3051 5.102038 0.8940678
wtdb <- lapply(wt.converted, NPC.dbscan)
wt.clustfrac <- lapply(wtdb, clust.frac)
wtfinal <- plyr::ldply(wt.clustfrac)
wtfinal %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% kable(.)
total clustd frac
108.3051 2.555085 0.0230172

We can see a few take away points:

  • On average, the NPC density is slightly lower than those seen in the simulated data.
  • The fraction of NPCs in clusters is 2.3 times higher than simulated random (2.3% vs 1%)

Analysis of Nup132∆ Nsp1-mCh SIM data.

mut.files <- list.files(pattern = "\\.xls$", path = "./data/mut_data/", recursive = FALSE, full.names = TRUE) #make a list of all of the files to be analyzed. This will only include files ending in xls
mut.files <- mixedsort(mut.files)
mut.data <- lapply(mut.files, fread, select = c("x", "y", "z"))
names(mut.data) <- gsub("\\.xls$", "", mut.files) #remove the .xls from the names of the files
mut.convert <- lapply(mut.data, NPC_pix2microns) #convert the coordinates from pixels into micron space using custom function 'NPC_pix2microns', in which pixel/slice size is provided
mut.convert <- lapply(mut.convert, hullopt) #run converted x/y/z coordinates for NPCs through hull optimization function to remove any outlier spots prior to surface area/volume calculations.
mut.converted <-  map(mut.convert, 2) #extract the dataframe objects containing filtered x/y/z data sets for NPC stats analysis later
mut.removed <- map(mut.convert, 1) #make list of dbls containing number of points removed during hull optimization to be added to final results table later
mut.removed <- unlist(mut.removed) #convert list to a dataframe so can cbind later
mut.results <- lapply(mut.converted, NPC.stats) #calculate NPC number, NE surface area/volume, NPC density, sphericity using the custom 'NPC.stats' function on the data converted to microns
mutresults.combined <- plyr::ldply(mut.results, data.frame) #convert list of NPC stats to a single data frame with one row per file/image analyzed
mutresults.combined$Points.Removed <- mut.removed #add number of points removed column

mutresults.combined %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% kable(.)
SA VOL sphericity.x num.NPCs density Points.Removed
20.30144 7.764576 0.9175474 74.04011 3.719701 0.4304813
mutdb <- lapply(mut.converted, NPC.dbscan)
mut.clustfrac <- lapply(mutdb, clust.frac)
mutfinal <- plyr::ldply(mut.clustfrac)
mutfinal %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% kable(.)
total clustd frac
74.04011 2.518717 0.0337895
  • The major difference in this data set is that there are significantly fewer NPCs detected. This is likely an artifact of the clustering - if they are clustered close together, then the track max not mask will remove more than one NPC during the masking step.

  • The fraction of clustered NPCs increases to 3.4% of all NPCs. It is nice to see this increase, however the magnitude of clustering seems very small. Perhaps we should try to subset out nuclei from the Nup132∆ data sets that have clear, strong clustering and see how the fraction clustered looks in these? There were clearly a majority of nuclei that did not have strong clustering even when we used the proximity ratio/pair correlation function approach, so these could be weighing down the average for the whole data set.

Comparison between datasets

simG2final$Strain <- c("Random")
wtfinal$Strain <- c("WT")
mutfinal$Strain <- c("Nup132∆")

ls <- list(simG2final, wtfinal, mutfinal)
df <- do.call("rbind", ls)
df <- df %>%
  dplyr::rename(Fraction.Clustered = frac) %>% 
  select(Fraction.Clustered, Strain)

df$Strain <- factor(df$Strain, levels=c("Random", "WT", "Nup132∆"))

df.melt <- reshape2::melt(df, value="Fraction.Clustered")

comparisons <- list(c("Random", "WT"), c("Random", "Nup132∆"), c("WT", "Nup132∆"))


ggplot(df, aes(x=Strain, y=Fraction.Clustered)) +
  geom_sina() +
  ylim(0,0.3) +
  ggpubr::stat_compare_means(method="kruskal.test", label.y=0.27) +
  ggpubr::stat_compare_means(comparisons = list(c("Random", "WT"), c("Random", "Nup132∆"), c("WT", "Nup132∆")), method = "wilcox.test", paired=FALSE, label.y = c(.15, .2, .25)) +
  theme_cowplot()

Conclusion

The DBSCAN analysis seems to work and is easily added into the analysis workflow. Will go ahead and move forward with getting this worked in, keeping the proximity ratio measurements from the pair correlation functions so we can compare those two methods for looking at clustering.